yannic kilcher

[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Yannic Kilcher on PhD's for ML #shorts

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

Were RNNs All We Needed? (Paper Explained)

Yannic Kilcher on superintelligence #machinelearning

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)

My GitHub (Trash code I wrote during PhD)

Grokking: Generalization beyond Overfitting on small algorithmic datasets (Paper Explained)

Attention Is All You Need

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)

Flow Matching for Generative Modeling (Paper Explained)

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

JEPA - A Path Towards Autonomous Machine Intelligence (Paper Explained)

What is Q-Learning (back to basics)

Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)

Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper Explained)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)

No, Anthropic's Claude 3 is NOT sentient

xLSTM: Extended Long Short-Term Memory

GPT-4chan: This is the worst AI ever

Hopfield Networks is All You Need (Paper Explained)

OpenAI CLIP: Connecting Text and Images (Paper Explained)